fix(agent): strengthen prompt reliability (result verification, budget semantics, safety boundaries, and epistemic discipline) #1354
CodFrm merged 4 commits into `release/v1.4-agent`
Pull request overview
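This PR updates the prompt text for the main agent, the sub-agents, and the compact summarizer, along with the test assertions, to make agent orchestration more reliable: sub-agent result acceptance, budget semantics, boundaries around irreversible operations, and discipline about output confidence and verification. The goal is to reduce systemic failure modes such as "silent failure propagation" and "claiming success without verification".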
Changes:
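- Adds sub-agent result-acceptance rules and fallback guidance for dependent tasks to the main agent prompt, and extends the irreversible-action confirmation list to cover userscripts.
- Fixes the tool-call budget semantics in the sub-agent prompt, and adds output discipline for the researcher, page_operator, and general roles (confidence tiers, separating actions from results, transparency about trade-offs, and honesty about failure).
- Updates the system prompt tests, adding assertions that cover most of the text changes in this PR.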
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
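| src/app/service/agent/core/system_prompt.ts | Main and sub-agent system prompt updates: stronger safety rules, sub-agent result-acceptance rules, fallback for dependent tasks, and corrected sub-agent budget semantics. |
| src/app/service/agent/core/system_prompt.test.ts | Adds assertions for the system prompt and the sub-agent system prompt, covering the new sections and key wording. |
| src/app/service/agent/core/sub_agent_types.ts | Adds stricter output and verification rules for researcher/page_operator/general (confidence levels, result verification, honesty about failure, etc.). |
| src/app/service/agent/core/compact_prompt.ts | Makes the compact summarizer give higher priority to recording mid-task corrections and requires them to be recorded verbatim. |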
```diff
 const SECTION_SAFETY = `## Safety

-- **Confirm before irreversible actions**: submitting forms, making purchases, deleting data, posting content.
+- **Confirm before irreversible actions**: submitting forms, making purchases, deleting data, posting content, installing or modifying userscripts. For userscripts specifically, show the script's \`@match\` patterns and a summary of what it does before installing — a userscript runs on every matching page after installation and cannot be easily recalled.
```
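The main agent's Safety section now requires confirmation before installing userscripts, but `SUB_AGENT_SECTION_SAFETY` in the sub-agent prompt still covers only forms, purchases, deletions, and posting. Since the actual click/install flow is more likely to run inside the page_operator sub-agent, this gap leaves a sub-agent that is handed an "install script" task without an equivalent safety constraint. Suggest adding "installing or modifying userscripts" to the sub-agent Safety section as well, and stating explicitly that a sub-agent that cannot call `ask_user` must stop and report the `@match` patterns plus a summary of what the script does back to the parent agent, then wait for confirmation.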
Suggested change:

```diff
-- **Confirm before irreversible actions**: submitting forms, making purchases, deleting data, posting content, installing or modifying userscripts. For userscripts specifically, show the script's \`@match\` patterns and a summary of what it does before installing — a userscript runs on every matching page after installation and cannot be easily recalled.
+- **Confirm before irreversible actions**: submitting forms, making purchases, deleting data, posting content, installing or modifying userscripts. For userscripts specifically, show the script's \`@match\` patterns and a summary of what it does before installing — a userscript runs on every matching page after installation and cannot be easily recalled. If the executor cannot ask the user directly, it must stop and report the \`@match\` patterns plus the summary back to the parent agent for confirmation instead of proceeding.
```
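The suggested rule is a branch on whether the executor can reach the user. A minimal sketch of that branch, assuming the executor knows whether it can ask the user; `InstallRequest`, `GateDecision`, and `gateUserscriptInstall` are hypothetical names for illustration, not APIs from this repository:

```typescript
// Hypothetical shape of a pending userscript install.
interface InstallRequest {
  name: string;
  matchPatterns: string[]; // the script's @match patterns
  summary: string; // short description of what the script does
}

// Hypothetical outcomes: the executor never installs without confirmation.
type GateDecision =
  | { kind: "ask_user"; prompt: string }
  | { kind: "report_to_parent"; matchPatterns: string[]; summary: string };

// If the executor can ask the user, build a confirmation prompt that shows
// @match and the summary; otherwise stop and hand both back to the parent agent.
function gateUserscriptInstall(req: InstallRequest, canAskUser: boolean): GateDecision {
  if (canAskUser) {
    const prompt =
      `Install "${req.name}"?\n` +
      `It will run on: ${req.matchPatterns.join(", ")}\n` +
      `What it does: ${req.summary}`;
    return { kind: "ask_user", prompt };
  }
  return { kind: "report_to_parent", matchPatterns: req.matchPatterns, summary: req.summary };
}

// Example: a page_operator-style sub-agent that cannot call ask_user.
const pending: InstallRequest = {
  name: "Example script",
  matchPatterns: ["https://example.com/*"],
  summary: "Hides the cookie banner",
};
console.log(gateUserscriptInstall(pending, false).kind); // report_to_parent
```

Returning a decision object, rather than performing the install inside the function, keeps the confirmation step explicit for whichever agent receives the result.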
```diff
 3. **User Messages**
-- List ALL user messages that are not tool results
-- These are critical for understanding the user's feedback and changing intent
-- Include any mid-conversation corrections or preference changes
+- List ALL user messages that are not tool results, in order
+- **Mid-task corrections are highest priority** — if the user interrupted an ongoing operation with a correction (e.g. "stop", "do it differently", "that's wrong"), record these verbatim. These messages are the most commonly lost in long conversations and the most damaging to skip: a resumed agent will repeat the exact mistake that was already corrected.
+- Include preference changes, clarifications, and any instruction that overrides an earlier one
```
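The Section 3 text of `buildCompactUserPrompt` was strengthened semantically (mid-task corrections are top priority and must be recorded verbatim), but the existing `compact_prompt.test.ts` only asserts that the 8 section headings are present and does not cover this key constraint. Suggest adding assertions to that test file (e.g. that the prompt contains key phrases such as "Mid-task corrections are highest priority" and "record these verbatim") to guard against future prompt regressions.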
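The requested assertions boil down to a "required phrases" check. The sketch below is framework-agnostic; in the real test file the same idea would presumably use the suite's own matchers (for example `toContain` in Vitest or Jest). `sectionThree` is an abridged stand-in copied from the diff above, not a call to `buildCompactUserPrompt`, whose signature is not shown here, and `missingPhrases` is a hypothetical helper.

```typescript
// Key phrases from this PR's Section 3 that should survive future prompt edits.
const REQUIRED_PHRASES: readonly string[] = [
  "Mid-task corrections are highest priority",
  "record these verbatim",
  "in order",
];

// Returns the phrases missing from the prompt text (empty array means pass).
function missingPhrases(prompt: string, phrases: readonly string[]): string[] {
  return phrases.filter((p) => !prompt.includes(p));
}

// Stand-in for the real buildCompactUserPrompt() output (abridged copy of the diff).
const sectionThree = [
  "3. **User Messages**",
  "- List ALL user messages that are not tool results, in order",
  "- **Mid-task corrections are highest priority** [...] record these verbatim. [...]",
  "- Include preference changes, clarifications, and any instruction that overrides an earlier one",
].join("\n");

const missing = missingPhrases(sectionThree, REQUIRED_PHRASES);
if (missing.length > 0) {
  throw new Error(`compact prompt regression, missing: ${missing.join(", ")}`);
}
```

Checking exact phrases is brittle by design: any rewording of the correction rule fails the test and forces a deliberate decision.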
Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code
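Background
Observation of real agent behavior surfaced four classes of systemic failure modes:
In addition, in long conversations the compact summarizer frequently dropped the user's mid-task corrections, so the resumed agent repeated mistakes that had already been corrected; and when parallel sub-agents depended on each other, a downstream agent would silently carry on even if the upstream one had not succeeded.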
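The dependency failure above comes down to a downstream step that never checks its input. A minimal sketch of an explicit check, under stated assumptions: `SubAgentResult`, its `status`/`issues`/`outputPath` fields, `Plan`, and `planDownstream` are hypothetical, loosely modeled on the `Issues` field the new prompt text mentions, and the OPFS lookup is reduced to a caller-supplied predicate.

```typescript
// Hypothetical result shape returned by an upstream sub-agent.
interface SubAgentResult {
  status: "success" | "failed";
  issues: string[]; // problems the sub-agent reported; empty if none
  outputPath?: string; // e.g. an OPFS file the downstream task reads
}

type Plan =
  | { kind: "proceed"; input: string }
  | { kind: "fallback"; reason: string }
  | { kind: "escalate"; reason: string };

// Decide explicitly what the downstream task does; never assume upstream succeeded.
function planDownstream(upstream: SubAgentResult, inputExists: (path: string) => boolean): Plan {
  if (upstream.status === "failed") {
    return { kind: "escalate", reason: `upstream failed: ${upstream.issues.join("; ") || "no details"}` };
  }
  if (upstream.issues.length > 0) {
    // Issues reported alongside "success": surface them instead of merging silently.
    return { kind: "escalate", reason: `upstream reported issues: ${upstream.issues.join("; ")}` };
  }
  if (!upstream.outputPath || !inputExists(upstream.outputPath)) {
    return { kind: "fallback", reason: "expected upstream output is missing" };
  }
  return { kind: "proceed", input: upstream.outputPath };
}

// Example: upstream claims success, but the file it should have written is absent.
const files = new Set(["/data/links.json"]);
const plan = planDownstream(
  { status: "success", issues: [], outputPath: "/data/missing.json" },
  (p) => files.has(p),
);
console.log(plan.kind); // fallback
```

The sketch collapses every problem case into "escalate"; a real policy might first retry or switch agents, as the new prompt section allows.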
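Changes

`system_prompt.ts`

**New: sub-agent result-acceptance rules.** A `### Receiving Sub-Agent Results` section is inserted after `### Anti-Patterns` in `SECTION_SUB_AGENT`. It explicitly requires checking the `Issues` field; if problems are reported, the main agent must make an explicit decision (retry, switch to another agent, or escalate to the user) and must not silently merge the result.

**New: fallback guidance for parallel tasks.** Appended to the end of `### Writing Sub-Agent Prompts`: if a sub-agent depends on upstream output (e.g. an OPFS file), the delegating prompt must spell out the fallback behavior for when that input is missing, instead of assuming the upstream task succeeded.

**Extended: irreversible-action confirmation list.** The first item of `SECTION_SAFETY` now appends `installing or modifying userscripts` after `posting content`, with the rationale: once installed, a userscript keeps running on every matching page, so its `@match` patterns and a summary of what it does must be shown to the user for confirmation before installing.

`sub_agent_types.ts`

**`SUB_AGENT_SECTION_TOOL_USAGE`: budget semantics fix.** The old wording (which included `Use them wisely`) is replaced with wording stating that the budget `covers this subtask only`. This makes clear that a sub-agent's budget applies only to its current subtask and is independent of the main agent's, eliminating the premature give-ups caused by trying to "save budget".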
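To make the budget semantics concrete, here is a minimal sketch in which every delegated subtask gets its own counter. `ToolCallBudget`, `startSubtask`, and the limit values are hypothetical; the source only states that a sub-agent's budget covers its current subtask and is independent of the main agent's.

```typescript
// Hypothetical per-task tool-call budget.
class ToolCallBudget {
  readonly limit: number;
  private used = 0;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Records one tool call; returns false once the limit is exhausted.
  tryConsume(): boolean {
    if (this.used >= this.limit) return false;
    this.used += 1;
    return true;
  }

  get remaining(): number {
    return this.limit - this.used;
  }
}

// Each delegated subtask gets a fresh budget; spending it never touches the parent's.
function startSubtask(limit: number): ToolCallBudget {
  return new ToolCallBudget(limit);
}

// Example: the sub-agent can use its whole budget without draining the main agent's.
const parent = new ToolCallBudget(50);
parent.tryConsume(); // main agent spends one call on delegation
const child = startSubtask(20);
for (let i = 0; i < 20; i++) child.tryConsume();
console.log(child.tryConsume(), child.remaining, parent.remaining); // false 0 49
```

Because the child's counter is a separate object, a sub-agent has no reason to ration calls "for later": the remaining budget it sees is exactly what its own subtask can spend.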
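**`researcher.systemPromptAddition`: tiered confidence in output.** A rule is appended at the end: the output must distinguish three categories of information, and must not blend them into a single narrative, because the main agent needs distinguishable confidence signals to make correct decisions.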
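**`page_operator.systemPromptAddition`: separating actions from results.** A rule is appended at the end: "clicked the submit button" and "the form was submitted successfully" are two different facts. After every action, the result must be verified with `get_tab_content` or `execute_script`; if it cannot be confirmed, the agent must say so plainly instead of inferring success.

**`general.systemPromptAddition`: transparency of choices and honesty about failure.** A rule is appended at the end: when several viable approaches exist, briefly explain the trade-offs behind the choice; when an approach fails, report it as a failure rather than dressing it up as a "partial success".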
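The page_operator rule amounts to recording the action and the observed outcome as separate fields, with "unconfirmed" as a first-class result. In this sketch `ActionReport` and `reportAction` are hypothetical, `readPage` stands in for whatever `get_tab_content` returns (its real signature is not shown in this PR), and the success/failure markers are assumptions supplied by the caller.

```typescript
// Hypothetical report shape: the action and its verified outcome are separate facts.
interface ActionReport {
  action: string; // what was done, e.g. "clicked the submit button"
  outcome: "verified_success" | "verified_failure" | "unconfirmed";
  evidence?: string; // the marker that was actually observed, if any
}

// Reports an action together with what could actually be observed afterwards.
function reportAction(
  action: string,
  readPage: () => string | null, // stand-in for get_tab_content; null means the page could not be read
  successMarker: string,
  failureMarker?: string,
): ActionReport {
  const page = readPage();
  if (page === null) return { action, outcome: "unconfirmed" };
  if (page.includes(successMarker)) {
    return { action, outcome: "verified_success", evidence: successMarker };
  }
  if (failureMarker && page.includes(failureMarker)) {
    // Report a failure as a failure, not as "partial success".
    return { action, outcome: "verified_failure", evidence: failureMarker };
  }
  // Neither marker seen: do not infer success from the click alone.
  return { action, outcome: "unconfirmed" };
}
```

A marker string is a crude oracle; the point of the sketch is only that "clicked" never becomes "submitted" without an observation backing it.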
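`compact_prompt.ts`

**`buildCompactUserPrompt`: priority of mid-task corrections.** Section 3 (`User Messages`) is rewritten to list user messages in order, to treat mid-task corrections as the highest priority and record them verbatim, and to include preference changes, clarifications, and any instruction that overrides an earlier one.

Scope of impact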
- `system_prompt.ts`
- `sub_agent_types.ts`
- `compact_prompt.ts`
- `system_prompt.test.ts`

All changes are prompt text only; there are no architectural changes, no new agent types, and no changes to TypeScript runtime logic.
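Testing
The new assertions cover all of the text changes in this PR, including:
- the `### Receiving Sub-Agent Results` section and its key wording
- the new budget wording (`covers this subtask only`), plus an assertion that the old wording (`Use them wisely`) no longer appears
- `installing or modifying userscripts` and the `@match` explanation
- `Mid-task corrections are highest priority` and the verbatim-recording requirement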